Voice activity detector for calculating spectral irregularity measure on the basis of spectral difference measurements

ABSTRACT

A voice activity detector suitable for deployment in a mobile phone apparatus is disclosed. An advantage of the voice activity detector is that it is better able to provide a decision (79) as to whether an input signal (19) consists of noise (which it is not desired to transmit) or comprises speech or information tones (which are required to be transmitted), especially in noisy environments. The voice activity detector includes a number of components, in particular an auxiliary voice activity detector (3). The auxiliary voice activity detector (3) distinguishes between noise and speech on the basis that the spectrum of speech changes more rapidly than that of noise. This results in the auxiliary detector (3) rarely mistaking a speech signal for a noise signal. Hence, a very reliable noise template (421) is obtained. For this reason, the auxiliary detector (3) is also useful in noise reduction applications. The voice activity detector also uses a neural net classifier (7).

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice activity detector. It has particular utility in relation to an auxiliary voice activity detector comprised in a main voice activity detector and also when comprised in a noise reduction apparatus. A main voice activity detector incorporating such an auxiliary voice detector is especially suitable for use in mobile phones which may be required to operate in noisy environments.

2. Description of Related Art

Because of the limited regions of the electromagnetic spectrum which have been made available for use by cellular radio systems, the strong growth in the number of mobile phone users over the last decade has meant that cellular radio equipment suppliers have had to find ways to increase the efficiency with which the available electromagnetic spectrum is utilised.

One way in which this aim can be achieved is to reduce the size of the cells within the cellular radio system. However, it is found that cell size can only be reduced by so much before the level of interference from nearby cells (co-channel interference) becomes unacceptably high. In order to reduce co-channel interference, a technique called discontinuous transmission is used. This technique involves arranging the mobile phone to transmit speech-representing signals only when the mobile phone user is speaking and is based on the observation that, in a given conversation, it is usual for only one of the parties to speak at any one time. By implementing discontinuous transmission, the average level of co-channel interference can be reduced. This, in turn, means that the cell size in the system can be reduced and hence that the system can support more subscribers.

Another advantage of only transmitting sound-representing signals when the mobile phone user is speaking is that the lifetime of the electric battery within the mobile phone handset is increased.

A voice activity detector is used to enable discontinuous transmission. The purpose of such a detector is to indicate whether a given signal consists only of noise, or whether the signal comprises speech. If the voice activity detector indicates that the signal to be transmitted consists only of noise, then the signal is not transmitted.

Many mobile phones today use a voice activity detector similar to that described in European Patent No. 335521. In the voice activity detector described therein, the similarity between the spectrum of an input sound-representing signal and the spectrum of a noise signal is measured. The noise spectrum to be used in this comparison is obtained from earlier portions of the input signal which were determined to be noise. That judgement is made by an auxiliary voice activity detector which forms a component of the main voice activity detector. Since it is important that signals comprising speech are transmitted by the mobile phone and since the decision of the main voice activity detector is based on signals identified as noise by the auxiliary voice detector, it is desirable that the auxiliary voice detector tends, in borderline situations, towards a determination that the signal comprises speech. The proportion of a conversation which is identified as speech by a voice activity detector is called the voice activity factor (or simply “activity”) of the detector. The proportion of conversation which in fact comprises speech is typically in the range 35% to 40%. So, ideally, a main voice activity detector will have an activity lying within this range or slightly above it, whereas an auxiliary voice activity detector can have a significantly higher activity.

Although the known voice activity detectors exhibit good performance in a variety of environments, their performance has been found to be poor in noisy environments. A mobile phone may be required to operate in cars, in city streets, in busy offices, in train stations or in airports. There is therefore a requirement for a voice activity detector that can operate reliably in noisy environments.

BRIEF SUMMARY OF THE INVENTION

According to the first aspect of the present invention there is provided a voice activity detector comprising:

means arranged in operation to calculate at least one first spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of a signal, one of the time segments of the pair lagging the other by a first time interval;

means arranged in operation to calculate at least one second spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of a signal, one of the time segments of the pair lagging the other by a second time interval which differs from said first time interval;

means arranged in operation to calculate a spectral irregularity measure on the basis of at least said first and second spectral difference measures; and

means arranged in operation to compare said spectral irregularity measure with a threshold measure.

This voice activity detector has the advantage that it provides a reliable determination that an input signal consists of noise. As stated above, this is a desirable property for an auxiliary voice activity detector which is used to identify signals which are used as noise templates in other processes carried out in an apparatus. Also, by combining spectral difference measures derived in relation to different time intervals, a voice activity detector according to the present invention takes into account the degree of stationarity of the signal over different time intervals. For example, if a first spectral difference measure were to be calculated in relation to a first relatively long time interval and a second spectral difference measure were to be calculated in relation to a relatively short time interval, then both the short-term and long-term stationarity of the signal would influence a spectral irregularity measure which combines the first and second spectral difference measures. Since the spectrum of noise, unlike speech, is stationary at least over time intervals ranging from 80 ms to 1 s, the voice activity detector of the present invention provides a robust performance in noisy environments.

Preferably, the predetermined length of time is in the range 400 ms to 1 s. This has the advantage that the relatively rapidly time-varying nature of a speech spectrum can be best discriminated from the relatively slowly time-varying nature of a noise spectrum.

Preferably, said spectral irregularity measure calculating means are arranged in operation to calculate a weighted sum of said spectral difference measures. This has the advantage that, in making a speech/noise decision, more weight can be given to spectral difference measures derived from time intervals over which the difference in stationarity between speech spectra and noise spectra is most pronounced.

According to a second aspect of the present invention there is provided a voice activity detector including:

a voice activity detector according to the first aspect of the present invention operable as an auxiliary voice activity detector.

Since the auxiliary voice activity detector has a high activity, a determination that an input signal consists of noise can be relied on to be correct. Furthermore, because the correct functioning of the main voice activity detector relies on the auxiliary voice activity detector correctly identifying a noise signal, a voice activity detector according to the second aspect of the present invention makes a reliable determination of whether a signal comprises speech or consists only of noise.

According to a third aspect of the present invention there is provided a noise reduction apparatus comprising:

a voice activity detector according to the first aspect of the present invention;

means arranged in operation to provide an estimated noise spectrum on the basis of one or more spectra obtained from respective time segments determined to consist of noise by said voice activity detector; and

means arranged in operation to subtract said estimated noise spectrum from spectra obtained from subsequent time segments of said signal.

It is known by those skilled in the art that the technique of spectral subtraction only works well if the noise which is to be subtracted from the signal to be enhanced is stationary in its nature. This means that a combination of a spectral subtraction device and a voice activity detector according to the first aspect of the present invention forms a particularly effective noise reduction apparatus, since the operation of the voice activity detector according to the first aspect of the present invention means that an input signal will be determined to consist of noise only if that noise signal has been largely stationary within the predetermined length of time.

Generally, any apparatus which requires a reliable noise template will benefit from the inclusion of a voice activity detector according to the first aspect of the present invention.

According to a fourth aspect of the present invention, there is provided a voice activity detector comprising means arranged in operation to extract feature values from an input signal and neural net means arranged in operation to process a plurality of said feature values to output a value indicative of whether said input signal consists of noise.

An advantage of this apparatus is that a neural net, once trained, can model relationships between the input parameters and the output decision which cannot be easily determined analytically. Although the process of training the neural net is labour intensive, once the neural net has been trained, the computational complexity of the algorithm is less than that found in known algorithms. This is of course advantageous in relation to a product such as a voice activity detector which is likely to be produced in large numbers.

Preferably, the input parameters to the neural net include cepstral coefficients derived from the signal to be transmitted. It has been found that these are useful parameters in making the distinction between speech and noise.

According to a fifth aspect of the present invention there is provided a method of voice activity detection comprising the steps of:

calculating at least one first spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of a signal, one of the time segments of the pair lagging the other by a first time interval;

calculating at least one second spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of a signal, one of the time segments of the pair lagging the other by a second time interval which differs from said first time interval;

calculating a spectral irregularity measure on the basis of at least said first and second spectral difference measures;

comparing said spectral irregularity measure with a threshold measure; and

determining whether said signal consists of noise on the basis of the comparison.

This method has the advantage that the discrimination between noise and speech signals is robust.

According to a sixth aspect of the present invention there is provided a method of enhancing a spectrum representing the value of a spectral characteristic at a succession of predetermined frequencies, said enhancement comprising the steps of:

for each of said predetermined frequencies, comparing the value of said spectral characteristic at said frequency with the value of said characteristic at neighbouring frequencies and calculating an adjustment to said predetermined frequency spectral value, said calculation being such that the adjustment is increased on said predetermined frequency spectral value being greater than either of said neighbouring frequency spectral values and is decreased on said predetermined frequency spectral value being less than either of said neighbouring frequency spectral values; and

adjusting each of said spectral values within the spectrum in accordance with said calculated adjustment.

BRIEF DESCRIPTION OF THE DRAWINGS

By way of example only, specific embodiments of the present invention will now be described in relation to the accompanying drawings, in which:

FIG. 1 is a block diagram illustrating the operation of the voice activity detector which forms a first embodiment;

FIG. 2 is a block diagram illustrating the operation of the auxiliary voice activity detector which forms a component of the voice activity detector of FIG. 1;

FIG. 3 is a block diagram illustrating the operation of the spectral subtraction component;

FIG. 4 is a diagram illustrating the operation of the classifier component; and

FIG. 5 is a block diagram of a known voice activity detector.

DETAILED DESCRIPTION OF THE INVENTION

The voice activity detector illustrated in FIG. 1 is arranged for use in a mobile phone apparatus and inputs a signal 19 before carrying out a series of processes 2, 3, 4, 5, 6, 7 (each represented as a rectangle) on the signal in order to arrive at a decision 79 as to whether the input signal consists only of noise. At the end of each process 2, 3, 4, 5, 6, 7 a resultant parameter or parameter set 29, 39, 49, 59, 69, 79 (each represented as an ellipse) is produced. Each of these processes 2, 3, 4, 5, 6, 7 can be carried out by a suitable Digital Signal Processing integrated circuit, such as the AT&T DSP32C floating point 32-bit processor.

The input to the voice activity detector is a digital signal 19 which represents voice/information tones and/or noise. The signal 19 is derived by sampling an analogue signal at a rate of 8 kHz and each sample is represented by 13 bits. The signal 19 is input to the voice activity detector in 20 ms frames, each of which consists of 160 samples.

The signal 19 is input into a filterbank process 2 which carries out a 256-point Fast Fourier Transform on each input frame. The output of this process 2 is thirty-two frequency band energies 29 which represent the portion of the power in the input signal frame which falls within each of the thirty-two frequency bands bounded by the following values (frequencies are given in Hz): 100, 143, 188, 236, 286, 340, 397, 457, 520, 588, 659, 735, 815, 900, 990, 1085, 1186, 1292, 1405, 1525, 1625, 1786, 1928, 2078, 2237, 2406, 2584, 2774, 2974, 3186, 3410, 3648, 3900.

The first frequency band therefore extends from 100 Hz to 143 Hz, the second from 143 Hz to 188 Hz and so on. It will be seen that the lower frequency bands are relatively narrow in comparison to the higher frequency bands.
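By way of illustration only, the filterbank process 2 might be sketched as follows. The description does not say how a 160-sample frame is fitted to a 256-point transform, nor whether a window is applied, so zero-padding without a window is assumed here. The function name band_energies is hypothetical, and a naive discrete Fourier transform stands in for the Fast Fourier Transform so that the sketch needs only the standard library.

```python
import cmath
import math

# Band edges in Hz exactly as listed in the description (33 edges, 32 bands).
BAND_EDGES_HZ = [
    100, 143, 188, 236, 286, 340, 397, 457, 520, 588, 659, 735, 815, 900,
    990, 1085, 1186, 1292, 1405, 1525, 1625, 1786, 1928, 2078, 2237, 2406,
    2584, 2774, 2974, 3186, 3410, 3648, 3900,
]
SAMPLE_RATE_HZ = 8000
TRANSFORM_SIZE = 256


def band_energies(frame):
    """Return the 32 band energies for one 160-sample (20 ms) frame."""
    # Assumption: the frame is zero-padded to 256 points, with no window.
    padded = list(frame) + [0.0] * (TRANSFORM_SIZE - len(frame))

    # Power in bins 0..128. A naive DFT is used for clarity; an FFT
    # produces the same values with far fewer operations.
    power = []
    for k in range(TRANSFORM_SIZE // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * n / TRANSFORM_SIZE)
            for n, x in enumerate(padded)
        )
        power.append(abs(acc) ** 2)

    # Sum the power of every bin whose centre frequency lies in each band.
    bin_hz = SAMPLE_RATE_HZ / TRANSFORM_SIZE  # 31.25 Hz per bin
    energies = []
    for low, high in zip(BAND_EDGES_HZ, BAND_EDGES_HZ[1:]):
        energies.append(
            sum(p for k, p in enumerate(power) if low <= k * bin_hz < high)
        )
    return energies
```

With a bin spacing of 31.25 Hz, even the narrowest band (100 Hz to 143 Hz) contains at least one bin under this assignment rule.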

The frequency band energies 29 output by the filterbank 2 are input to an auxiliary voice activity detector 3 and to a spectral subtraction process 4.

Turning now to FIG. 2, the auxiliary voice activity detector 3 inputs the frequency band energies 29 and carries out a series of processes 31, 32, 33, 34 to provide an auxiliary decision 39 as to whether the signal frame 19 consists only of noise.

The first process used in providing the auxiliary decision 39 is the process 31. The process 31 involves taking the logarithm to the base ten of each of the frequency band energies 29 and multiplying the result by ten to provide thirty-two frequency band log energies 311. The log energies from the previous thirty input signal frames are stored in a suitable area of the memory provided on the DSP IC.

The spectral irregularity calculating process 32 initially inputs the log energies 311 from the current input signal frame 19 together with the log energies 314, 313, 312 from first, second and third signal frames, respectively occurring thirty frames (i.e. 600 ms), twenty frames (i.e. 400 ms) and ten frames (i.e. 200 ms) before the current input signal frame. The magnitude of the difference between the log energies 311 in each of the frequency bands for the current frame and the log energies 312 in the corresponding frequency band in the third frame is then found. The thirty-two difference magnitudes thus obtained are then summed to obtain a first spectral difference measure. In a similar way, second, third and fourth spectral difference measures are found which are indicative of the differences between the log energies 313, 312 from the second and third frames, the log energies 314, 313 from the first and second frames and the log energies 314, 311 from the first and current frames respectively. It will be seen that the first, second and third spectral difference measures are measures of differences between frames which are 200 ms apart. The fourth spectral difference measure is a measure of the difference between frames which are 600 ms apart. The first to fourth spectral difference measures are then added together to provide a spectral irregularity measure 321. The spectral irregularity measure therefore reflects both the stationarity of the signal over a 200 ms interval and the stationarity of the signal over a 600 ms interval.

Although, in this embodiment, the spectral irregularity measure is formed from a simple sum of the four spectral difference measures, it should be realised that a weighted addition might be performed instead. For example, the first, second and third spectral difference measures could be given a greater weighting than the fourth spectral difference measure or vice-versa. It will be realised by those skilled in the art that the effect of having three measures relating to a 200 ms interval and only one relating to a 600 ms interval is to provide a spectral irregularity measure where more weight is placed on spectral differences occurring over the shorter interval.
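The log-energy process 31 and the spectral irregularity process 32 might be sketched as follows. This is a minimal illustration only: the function names, the small floor used to avoid taking the logarithm of zero and the optional weights argument are assumptions introduced here. The history argument is taken to hold one log-energy vector per frame, newest last, and to contain at least thirty-one entries.

```python
import math


def log_energies(band_energies):
    """Process 31: ten times the base-ten logarithm of each band energy."""
    # Assumption: a tiny floor guards against log10(0) on silent bands.
    return [10.0 * math.log10(max(e, 1e-12)) for e in band_energies]


def spectral_difference(a, b):
    """Sum over all bands of the magnitude of the log-energy difference."""
    return sum(abs(x - y) for x, y in zip(a, b))


def spectral_irregularity(history, weights=(1.0, 1.0, 1.0, 1.0)):
    """Process 32: combine four spectral difference measures.

    history[-1] is the current frame; history[-11], history[-21] and
    history[-31] are the frames 200 ms, 400 ms and 600 ms earlier.
    """
    current = history[-1]
    back_200 = history[-11]
    back_400 = history[-21]
    back_600 = history[-31]
    measures = (
        spectral_difference(current, back_200),   # first measure, 200 ms
        spectral_difference(back_400, back_200),  # second measure, 200 ms
        spectral_difference(back_600, back_400),  # third measure, 200 ms
        spectral_difference(back_600, current),   # fourth measure, 600 ms
    )
    # Unit weights give the simple sum used in this embodiment.
    return sum(w * m for w, m in zip(weights, measures))
```

A perfectly stationary history gives a measure of zero, and passing unequal weights gives the weighted addition mentioned above.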

The spectral irregularity measure 321 is then input to a thresholding process 33 which determines whether the measure 321 exceeds a predetermined constant K. The output of this process is a noise condition which is true if the measure 321 is less than the predetermined constant and false otherwise. The noise conditions obtained on the basis of the previous two frames are stored in a suitable location in memory provided on the DSP IC. The noise condition is input to the hangover process 34 which outputs an auxiliary decision 39 which indicates that the current signal frame consists of noise only if the noise condition is found to be true and if the noise condition was also true when derived from the previous two frames. Otherwise the auxiliary decision indicates that the current frame comprises speech.
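A minimal sketch of the thresholding process 33 and the hangover process 34 follows. The class name is hypothetical, no value for the constant K is given in the description so it is left as a parameter, and the stored noise conditions are assumed to start as false so that the first frames are treated as speech.

```python
from collections import deque


class AuxiliaryDecision:
    """Threshold the irregularity measure and apply a two-frame hangover."""

    def __init__(self, threshold_k):
        self.threshold_k = threshold_k
        # Noise conditions from the previous two frames.
        # Assumption: both start as False (treated as speech).
        self.previous = deque([False, False], maxlen=2)

    def update(self, irregularity):
        """Return True if the current frame is judged to consist of noise."""
        noise_condition = irregularity < self.threshold_k
        is_noise = noise_condition and all(self.previous)
        self.previous.append(noise_condition)
        return is_noise
```

A single frame with a high irregularity measure therefore forces the decision to speech for that frame and for the two frames that follow it.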

The present inventors have found that the spectral characteristics of a signal which consists of noise change more slowly than the spectral characteristics of a signal which comprises speech. In particular, the difference between the spectral characteristics of a noise signal over an interval of 400 ms to 1 s is significantly less than a corresponding difference in relation to a speech signal over a similar interval. The auxiliary voice activity detector (FIG. 2) uses this difference to discriminate between input signals which consist of noise and those which comprise speech. It is envisaged that such a voice activity detector could be used in a variety of applications, particularly in relation to noise reduction techniques where an indication that a signal is currently noise might be needed in order to form a current estimate of a noise signal for subsequent subtraction from an input signal.

Returning to FIG. 1, the auxiliary decision 39 output by the auxiliary voice activity detector (FIG. 2) is input to the spectral subtraction process 4 together with the frequency band energies 29. The spectral subtraction process is shown in detail in FIG. 3. Firstly, the frequency band energies 29 are compressed in the compress process 41 by raising them to the power 5/7. The compressed frequency band energies are then input to the noise template process 42. The compressed frequency band energies N1 derived from the current input signal frame and the compressed frequency band energies N2, N3, N4 derived from the previous three frames are stored, together with the auxiliary decisions relating to those frames, in four fields in memory on the DSP IC. If the current and the previous three input signal frames have been designated as noise, the four sets of compressed frequency band energies N1, N2, N3, N4 are averaged in order to provide a noise template 421.
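The compress process 41 and the noise template process 42 might be sketched as follows. The class and function names are hypothetical, and the template is assumed to be undefined (None) until four consecutive noise frames have been seen, a point on which the description is silent.

```python
from collections import deque

COMPRESSION_EXPONENT = 5.0 / 7.0


def compress(band_energies):
    """Process 41: raise each band energy to the power 5/7."""
    return [e ** COMPRESSION_EXPONENT for e in band_energies]


class NoiseTemplate:
    """Process 42: average the last four frames when all four are noise."""

    def __init__(self):
        # Each entry is (compressed energies, auxiliary decision).
        self.recent = deque(maxlen=4)
        self.template = None  # assumption: no template until first update

    def update(self, compressed, is_noise):
        """Store the frame and refresh the template if appropriate."""
        self.recent.append((list(compressed), is_noise))
        if len(self.recent) == 4 and all(flag for _, flag in self.recent):
            frames = [values for values, _ in self.recent]
            self.template = [sum(band) / 4.0 for band in zip(*frames)]
        return self.template
```

Because the template is only refreshed when all four stored frames are noise, a single speech frame freezes the template for at least four frames.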

Each time the noise template 421 is updated, it is inputted to the spectral enhancement process 43. The spectral enhancement process comprises a number of enhancement stages. The nth stage of enhancement results in an n-times enhanced spectrum. Hence, the first stage of enhancement converts an initial noise template to a once-enhanced noise template, which is input to a second stage which provides a twice-enhanced noise template, and so on until at the end of the eighth and final stage an eight-times enhanced noise template results. Each enhancement stage proceeds as follows.

Firstly, the difference between the compressed energy value relating to the lowermost (first) frequency band and the compressed energy value relating to the second frequency band is calculated. Thereafter, the difference between the compressed energy value relating to the second frequency band and the third frequency band is calculated. Each corresponding difference is calculated up until the difference between the thirty-first frequency band and the thirty-second frequency band. These differences are stored in a suitable location in memory on the DSP IC.

In each enhancement stage, the input energy value of each frequency band of the input noise template is adjusted to increase the difference between that energy value and the energy values associated with the neighbouring frequency bands. The differences used in this calculation are those based on the input energy values, rather than the adjusted values produced during the current enhancement stage.

In more detail, in each enhancement stage, an adjusted first frequency band energy value is produced by adjusting the input first frequency band energy value by 5% of the magnitude of the difference between the input first frequency band energy value and the input second frequency band energy value. The adjustment is chosen to be an increase or a decrease so as to be effective to increase the difference between the two energy band values. Since the adjustment to the input second frequency band energy value depends on two neighbouring frequency band energy values, the adjustment is calculated in two steps. Firstly, a part-adjusted second frequency band energy value is produced by carrying out a 5% adjustment on the basis of the difference between the first and second frequency band energy values. The second part of the adjustment of the second frequency band energy value is then carried out in a similar way on the basis of the difference between the second and third frequency band energy values. This process is repeated for each of the other frequency bands save for the thirty-second frequency band energy value which has only one neighbouring frequency band energy value. The adjustment in this case is analogous to the adjustment of the first frequency band energy value.

It will be realised that if one of the neighbouring frequency band energy values is higher than the frequency band value being adjusted, and the other is lower, then the two parts of the adjustment will counteract one another.
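One reading of a single enhancement stage, and of the eight-stage process followed by scaling, is sketched below. The function names are hypothetical. Adding 5% of the signed difference to each neighbour is used here as a compact way of expressing "adjust by 5% of the magnitude, in the direction that increases the difference", and all adjustments within a stage are computed from that stage's input values, as the description requires.

```python
def enhance_once(values, rate=0.05):
    """One enhancement stage over a list of band energy values."""
    last = len(values) - 1
    adjusted = []
    for i, value in enumerate(values):
        result = value
        if i > 0:
            # Positive when this band exceeds its lower neighbour (raise it),
            # negative when it is below (lower it).
            result += rate * (value - values[i - 1])
        if i < last:
            result += rate * (value - values[i + 1])
        adjusted.append(result)
    return adjusted


def enhance(template, stages=8, scale=0.9):
    """Apply the stages in turn, then multiply by the scaling factor."""
    values = list(template)
    for _ in range(stages):
        values = enhance_once(values)
    return [scale * v for v in values]
```

For the input [1, 3, 1] a single stage gives approximately [0.9, 3.2, 0.9], sharpening the peak, whereas the middle value of [1, 2, 3] is unchanged because its two part-adjustments cancel.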

In a second stage of the spectral enhancement process 43, a similar process of adjustment occurs to provide a twice-enhanced noise template on the basis of the once-enhanced noise template. Once all eight enhancement stages have been carried out, each of the frequency band energy values is multiplied by a scaling factor, for example, 0.9. The present inventors have found that the introduction of the spectral enhancement process 43 means that the scaling factor can be reduced from a typical value for noise reduction applications (e.g. 1.1) without introducing a ‘musical’ spectral subtraction noise.

The adjusted noise template 431 output by the spectral enhancement process 43 exhibits more pronounced harmonics than are seen in the unmodified noise template 421. In this way, the spectral enhancement process 43 models the process known as ‘lateral inhibition’ that occurs in the human auditory cortex. This adjustment has been found to improve the performance of the main voice activity detector (FIG. 1) in situations where the signal-to-background-noise ratio is greater than 10 dB.

In the subtraction process 44 the adjusted noise template values 431 are subtracted from the corresponding values in the frequency band compressed energies 411 derived from the current input signal frame to provide compressed modified energies 441.

The compressed modified energies 441 are then input to a limiting process 45 which simply sets any compressed modified energy value which is less than 1 to 1. Once a lower limit has been introduced in this way, each of the compressed modified energy values is raised in an expansion step 46 to the power 1.4 (i.e. the reciprocal of the compression exponent of step 41) to provide the modified frequency band energies 49.
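The subtraction process 44, the limiting process 45 and the expansion step 46 might be combined in a single function along the following lines. The function name and argument names are hypothetical; the floor of 1 and the exponent of 1.4 are taken from the description.

```python
def subtract_and_expand(compressed, adjusted_template, floor=1.0, exponent=1.4):
    """Return modified band energies for one frame.

    compressed        -- compressed band energies of the current frame
    adjusted_template -- enhanced and scaled noise template, one value per band
    """
    modified = []
    for value, noise in zip(compressed, adjusted_template):
        difference = value - noise            # process 44: subtraction
        limited = max(difference, floor)      # process 45: lower limit of 1
        modified.append(limited ** exponent)  # step 46: undo the compression
    return modified
```

Since 1.4 is the reciprocal of 5/7, a band with no noise subtracted returns to its original energy, and a band in which the template exceeds the signal is clamped to an output of 1.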

Referring again to FIG. 1, the modified frequency band energies 49 are then input to a Mel Frequency Cepstral Coefficient calculating process 5 which calculates sixteen Mel Frequency Cepstral Coefficients for the current input signal frame on the basis of the modified frequency band energies 49 for the current input signal frame.
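The description does not set out how process 5 derives the coefficients. Purely as an assumption, the sketch below uses one common formulation: a type-II discrete cosine transform of the natural logarithm of the thirty-two band energies, keeping coefficients one to sixteen and discarding the zeroth. The function name, the choice of logarithm and the choice of which coefficients to keep are all hypothetical.

```python
import math


def cepstral_coefficients(modified_energies, count=16):
    """Assumed form of process 5: DCT-II of the log band energies."""
    n = len(modified_energies)
    # The limiting process guarantees energies of at least 1, so log is safe.
    logs = [math.log(e) for e in modified_energies]
    coefficients = []
    for k in range(1, count + 1):  # assumption: the zeroth term is skipped
        coefficients.append(
            sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(logs))
        )
    return coefficients
```

Under this formulation a spectrally flat frame yields coefficients that are all approximately zero, since only the discarded zeroth term carries the overall level.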

In a logarithm-taking process 6, similar operations to those carried out in relation to the process 31 are carried out on the modified frequency band energies 49 to provide logged modified frequency band energies 69.

The classification process 7 is carried out using a fully connected multilayer perceptron algorithm. The weights to be used in this algorithm are obtained by training the algorithm using a back-propagation algorithm with momentum (α=100, ε=0.05) using 6545 frames half of which are noise and half of which are speech. One hundred samples of training data are presented before each weight update and the training data is passed through two hundred times.

Referring to FIG. 4, the multilayer perceptron has forty-eight input nodes 71. The sixteen Mel Frequency Cepstral Coefficients 59 and thirty-two logged modified frequency band energies 69 are normalised by means not shown so as to lie between 0 and 1 before being input to respective input nodes. Each of the input nodes 71 is connected to every one of twenty primary nodes 73 (only one is labelled in the figure) via a connection 72 (again, only one is labelled in the figure). Each of the connections 72 has an associated weighting factor x which is set by the training process. The value at each of the primary nodes is calculated by summing the products of each of the input node values and the associated weighting factor. The value output from each of the primary nodes is obtained by carrying out a non-linear function on the primary node value. In the present case this non-linear function is a sigmoid.

The output from each of the primary nodes 73 is connected via connections 74 (again, each one has an associated weighting factor) to one of eight secondary nodes 75. The secondary node values are calculated on the basis of the primary node values using a method similar to that used to calculate the primary node values on the basis of the input node values. The output of the secondary nodes is again modified using a sigmoid function. Each of the eight secondary nodes 75 is connected to the output node 77 via a respective connection 76. The value at the output node is calculated on the basis of the outputs from the secondary nodes 75 in a similar way to the way in which the secondary node values are calculated on the basis of the outputs from the primary nodes. The value at the output node is a single floating point value lying between 0 and 1. If this value is greater than 0.5 then the decision 79 output by the voice activity detector indicates that the current input signal frame comprises speech, otherwise the decision 79 indicates that the input signal frame consists only of noise. It will be realised that the decision 79 forms the output of the main voice activity detector (FIG. 1).
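The forward pass through the classifier might be sketched as follows, assuming a fully connected 48-20-8-1 arrangement in which every layer, including the output node, applies a sigmoid to a weighted sum of its inputs. The description mentions no bias terms, so none are used. The function names and the row-per-node weight layout are hypothetical, and trained weights would in practice come from the training procedure described above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """Each row of weights holds the weighting factors feeding one node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)))
        for row in weights
    ]


def classify(features, primary_weights, secondary_weights, output_weights):
    """Return (output value, decision) for 48 normalised feature values.

    primary_weights   -- 20 rows of 48 weighting factors
    secondary_weights -- 8 rows of 20 weighting factors
    output_weights    -- 1 row of 8 weighting factors
    """
    primary = layer(features, primary_weights)
    secondary = layer(primary, secondary_weights)
    output = layer(secondary, output_weights)[0]
    comprises_speech = output > 0.5
    return output, comprises_speech
```

The decision is True when the frame is judged to comprise speech and False when it is judged to consist only of noise.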

In an alternative embodiment, the multilayer perceptron is provided with a second output node which indicates whether the input signal frame comprises information tones (such as a dial tone, an engaged tone or a DTMF signalling tone).

In order to reduce speech clipping, the output decision may only indicate that the input signal frame consists of noise if the output node value does not exceed 0.5 for the current input signal frame and did not exceed 0.5 for the previous input signal frame.

In some embodiments, the voice activity detector may be disabled from outputting a decision to the effect that an input signal frame consists of noise for a short initial period (e.g. 1 s).

A second embodiment of the present invention provides an improved version of the auxiliary voice detector defined in the standards document: ‘European Digital Cellular Telecommunications (phase 2); Voice Activity Detector (VAD) (GSM 06.32) ETS 300 580-6’. This corresponds to the Voice Activity Detector described in our European Patent 0 335 521 which is illustrated in FIG. 5.

Noisy speech signals are received at an input 601. A store 602 contains data defining an estimate or model of the frequency spectrum of the noise; a comparison is made (603) between this and the spectrum of the current signal to obtain a measure of similarity which is compared (604) with a threshold value. In order to track changes in the noise component, the noise model is updated from the input only when speech is absent. Also, the threshold can be adapted (adaptor 606).

In order to ensure that adaptation occurs only during noise-only periods, without the danger of progressive incorrect adaptation following a wrong decision, adaptation is performed under the control of an auxiliary detector 607, which comprises an unvoiced speech detector 608 and a voiced speech detector 609: the detector 607 deems speech to be present if either of the detectors recognises speech, and suppresses updating and threshold adaptation of the main detector. The unvoiced speech detector 608 obtains a set of LPC coefficients for the signal and compares the autocorrelation function of these coefficients between successive frame periods, whilst the voiced speech detector 609 examines variations in the autocorrelation of the LPC residual.

In the unvoiced speech detector 608, a measure of the spectral stationarity of the signal is used to form the decision as to whether the input signal comprises unvoiced speech. More specifically, the interframe change in a measure of the spectral difference between adjacent 80 ms blocks of the input signal is compared to a threshold to produce a Boolean stationarity decision. The spectral difference measure used is a variant of the Itakura-Saito distortion measure, the spectral representation of each 80 ms block being derived by averaging the autocorrelation functions of the constituent 20 ms frames. The second embodiment of the present invention improves the reliability of this decision.

According to the second embodiment of the present invention, a signal block to be analysed is divided into a number of sub-blocks, e.g. a 160 ms block divided into eight 20 ms sub-blocks. The unvoiced speech/noise decision is then determined by calculating a spectral distance measure between all the combinations of sub-block pairs (₈C₂=28 comparisons in this example), and summing the individual distance measures to form a single metric. The resultant metric is a measure of the spectral stationarity of the block being analysed. This measure of stationarity is more accurate than the one described in the above-referenced GSM standard because it considers the spectral similarity between pairs of sub-blocks, the constituents of which are spaced at different intervals (20 ms, 40 ms, 60 ms . . . 140 ms) rather than just the similarity between adjacent blocks. This method could be easily incorporated into the above GSM VAD, since the variant of Itakura-Saito Distortion Measure can be calculated from the auto-correlation function available for each 20 ms signal frame. It will be realised by those skilled in the art that other spectral measures, such as FFT based methods, could also be used. Also, a weighted combination of the distortion measures could be used in deriving the single metric referred to above. For example, the distortion measures could be weighted in proportion to the spacing between the sub-blocks used in their derivation.
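The pairwise summation of the second embodiment might be sketched as follows. The function name is hypothetical, and the distance measure is deliberately passed in as a parameter: the variant of the Itakura-Saito distortion measure used in the GSM standard is not reproduced here, and any spectral distance between two sub-block representations may be supplied instead.

```python
from itertools import combinations


def stationarity_metric(sub_blocks, distance, weight_by_spacing=False):
    """Sum a spectral distance over every pair of sub-blocks.

    sub_blocks        -- one spectral representation per 20 ms sub-block
    distance          -- function taking two representations, returning a number
    weight_by_spacing -- if True, weight each distance in proportion to the
                         number of sub-blocks separating the pair
    """
    total = 0.0
    for (i, a), (j, b) in combinations(enumerate(sub_blocks), 2):
        weight = float(j - i) if weight_by_spacing else 1.0
        total += weight * distance(a, b)
    return total
```

For eight sub-blocks the loop visits twenty-eight pairs, with spacings from one sub-block (20 ms) to seven sub-blocks (140 ms).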

What is claimed is:
 1. An apparatus comprising: means arranged in operation to calculate at least one first spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of a signal, one of the time segments of the pair lagging the other by a first time interval; means arranged in operation to calculate at least one second spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of the signal, one of the time segments of the pair lagging the other by a second time interval which differs from said first time interval; means arranged in operation to calculate a spectral irregularity measure on the basis of at least said first and second spectral difference measures; means arranged in operation to compare said spectral irregularity measure with a threshold measure; and means arranged in operation to determine whether the signal comprises of noise on the basis of the comparison.
 2. An apparatus according to claim 1 wherein said first and second time intervals are in the range 80 ms to 1 s.
 3. An apparatus according to claim 1 wherein said spectral irregularity measure calculating means is arranged in operation to calculate a weighted sum of said spectral difference measures.
 4. A voice activity detector including an apparatus according to claim 1 operable as an auxiliary voice activity detector.
 5. A voice activity detector according to claim 4 further comprising: means arranged in operation to provide an estimated noise spectrum on the basis of one or more spectra obtained from respective time segments determined to comprise of noise by said auxiliary voice activity detector; and means arranged in operation to subtract said estimated noise spectrum from spectra obtained from subsequent time segments of said signal.
 6. A mobile radio apparatus including an apparatus according to claim 1.
 7. A noise suppression apparatus comprising: means arranged in operation to calculate at least one first spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of a signal, one of the time segments of the pair lagging the other by a first time interval; means arranged in operation to calculate at least one second spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of the signal, one of the time segments of the pair lagging the other by a second time interval which differs from said first time interval; means arranged in operation to calculate a spectral irregularity measure on the basis of at least said first and second spectral difference measures; means arranged in operation to compare said spectral irregularity measure with a threshold measure; means arranged in operation to provide an estimated noise spectrum on the basis of one or more spectra obtained from respective time segments determined to comprise of noise; and means arranged in operation to subtract said estimated noise spectrum from spectra obtained from subsequent time segments of said signal.
 8. A voice activity detector comprising: means arranged in operation to extract feature values from an input signal; neural net means arranged in operation to process a plurality of said feature values to output a value indicative of whether said input signal comprises of noise; means arranged in operation to calculate at least one first spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of a signal, one of the time segments of the pair lagging the other by a first time interval; means arranged in operation to calculate at least one second spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of the signal, one of the time segments of the pair lagging the other by a second time interval which differs from said first time interval; means arranged in operation to calculate a spectral irregularity measure on the basis of at least said first and second spectral difference measures; and means arranged in operation to compare said spectral irregularity measure with a threshold measure; means arranged in operation to provide an estimated noise spectrum on the basis of one or more spectra obtained from respective time segments determined to comprise of noise by said voice activity detector; and means arranged in operation to subtract said estimated noise spectrum from spectra obtained from subsequent time segments of said signal.
 9. A method comprising: calculating at least one first spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of a signal, one of the time segments of the pair lagging the other by a first time interval; calculating at least one second spectral difference measure indicative of the degree of spectral similarity in a pair of time segments of the signal, one of the time segments of the pair lagging the other by a second time interval which differs from said first time interval; calculating a spectral irregularity measure on the basis of at least said first and second spectral difference measures; comparing said spectral irregularity measure with a threshold measure; and determining whether said signal comprises of noise on the basis of the comparison.
 10. A method according to claim 9 wherein said first and second time intervals are in the range 80 ms to 1 s.
 11. A method according to claim 9 wherein said spectral irregularity measure calculation involves forming a weighted sum of said spectral difference measures.
 12. A method of enhancing a spectrum representing the value of a predetermined spectral characteristic at a succession of predetermined frequencies said enhancement comprising the steps of: for each of said predetermined frequencies, comparing the value of said spectral characteristic at said frequency with the value of said characteristic at neighboring frequencies and calculating an adjustment to said predetermined frequency spectral value, said calculation being such that the adjustment is increased on said predetermined frequency spectral value being greater than either of said neighboring frequency spectral values and is decreased on said predetermined frequency spectral value being less than either of said neighboring frequency spectral values; and adjusting each of said spectral values within the spectrum in accordance with said calculated adjustment.
 13. A method according to claim 12 wherein said comparison comprises: obtaining said predetermined frequency spectral value; obtaining the value of said characteristic at an adjacent lower frequency; obtaining the value of said characteristic at an adjacent higher frequency; calculating a downward decrease amount on said predetermined frequency spectral value exceeding said lower frequency spectral value; calculating an upward decrease amount on said predetermined frequency spectral value exceeding said higher frequency spectral value; calculating a downward increase amount on said predetermined frequency spectral value being less than said lower frequency spectral value; calculating an upward increase amount on said predetermined frequency spectral value being less than said higher frequency spectral value; said adjustment calculation being such that said adjustment is increased on the basis of any decrease amount calculated and/or decreased on the basis of any increase amount calculated.
 14. A method according to claim 13 wherein said adjusting step comprises: increasing said predetermined frequency value by an amount linearly proportional to any decrease amount calculated; and/or decreasing said predetermined frequency value by an amount linearly proportional to any increase amount calculated.
 15. A method according to claim 12 comprising repeating all its steps a plurality of times.
 16. A method comprising enhancing a spectrum in accordance with claim 12.
 17. An apparatus comprising: a calculator which calculates a spectrum on the basis of a time segment of the signal and arranged in operation to calculate a first spectrum on the basis of a first time segment of the signal and a second spectrum on the basis of a second time segment of a signal, said second segment lagging said first segment by a predetermined length of time; a calculator which calculates a spectral difference measure between spectra and arranged in operation to calculate a spectral difference measure indicative of the spectral difference between said first and second spectra; a spectral irregularity measure calculator arranged in operation to calculate a spectral irregularity measure on the basis of at least said spectral difference measures; and a comparator which compares said spectral irregularity measure with a threshold measure; wherein said predetermined length of time is sufficiently great to reveal the time-varying character of speech signal spectra; said spectrum calculator is further arranged in operation to calculate one or more intermediate spectra on the basis of the time segments of said signal falling within said predetermined length of time; said spectral difference calculator is further arranged in operation to calculate intermediate spectral difference measures between some or all of said intermediate spectra and said first and second spectra; and said spectral irregularity measure calculator is arranged in operation to calculate the spectral irregularity measure on the basis of said spectral difference measure and said intermediate spectral difference measures.